-
-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-22763 MF2: Handle time zones correctly in :datetime #3012
base: main
Are you sure you want to change the base?
Conversation
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
Previously, the time zone components of date/time literals were ignored. In order to store the time zone properly for re-formatting, this change also changes the representation of `Formattable` date values to use a `GregorianCalendar` rather than a `UDate`. (This is a public API change.)
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
@FrankYFTang setting you as the main reviewer because Mihai is ooo for a while soon. Note that the API change proposal is still under discussion. |
Here is my view
icu::SimpleTimeZone::SimpleTimeZone ( int32_t rawOffsetGMT, |
Maybe something like
? |
Note that MF2 will not stay with only supporting vanilla 8601 strings. Real time zone support is important. An offset is insufficient. |
that is what I asked this morning and the answer I got back is only vanilla 8601 strings. if MF2 is going to support RFC 9557 https://datatracker.ietf.org/doc/html/rfc9557 then we will need to consider not just timezone extension in RFC9557 but also calendar extension in RFC 9557. Therefore, it will be a super bad idea to use GregorianCalendar since the calendar could be in a different calendar system. We should consider the following cases
For 4, we need a string to indicate the real time zone name but not more than that how about
|
I apologize for giving the wrong answer -- I was answering based on the existing spec. @aphillips is more of an authority than me about how the spec will evolve in the future. |
(personal response, chair hat off) The offset is irrelevant. The only thing an offset is useful for is adjusting the date value (the incremental time value--Temporal calls this an The use of non-Gregorian calendars in date/time serializations is more problematic: there's not a lot of implementation experience, particularly with non-binary multi-era calendars. It's usually better (or at least more common) to use a proleptic Gregorian calendar (i.e. 8601) on the wire (especially for incremental time values!) and convert using calendar rules for processing/display. Perhaps this will evolve and mature further? My experience here is limited. There also needs to be a concept of a floating time value (one not tied to specific instants on the timeline). This could use
In any case, this was a significant gap in the MF2 tech preview. It's important for us to add time zone management to MF2 before LDML46 and the ICU4C implementation should be designed/adapted with that in mind. |
Thanks for the feedback, @aphillips and @FrankYFTang . I'll take a look at these options and update the design doc accordingly. My first thought, though, is why not something like:
Why not re-use the existing |
@FrankYFTang I've made changes to match the suggestions; can you take another look? I've also updated the design proposal in the "ICU 76 API proposal status" doc. |
* @internal ICU 76 technology preview | ||
* @deprecated This API is for technology preview only. | ||
*/ | ||
UnicodeString calendarName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why you need createTimeZone() here since all the information you need can be derived from zoneName itself. I will prefer we not adding more function if it is really needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included it because to call DateFormat::format()
to eventually format the date, a TimeZone
object has to be constructed (to pass to DateFormat
's adoptTimeZone()
or setTimeZone()
) method. But it doesn't have to be a method of the DateInfo
struct (changed in 1c68d51).
* @internal ICU 76 technology preview | ||
* @deprecated This API is for technology preview only. | ||
*/ | ||
UnicodeString zoneName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DateInfo
is movable (because it can appear inline in Formattable
, which is movable), so the members can't be const. I could change DateInfo
to a class and make the members private, if you think that would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looking GTM, some comments
* @internal ICU 76 technology preview | ||
* @deprecated This API is for technology preview only. | ||
*/ | ||
UnicodeString calendarName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the calendarName supposed to map to? Is it the 'old style name' (gregorian
) matching
icu/icu4c/source/i18n/unicode/calendar.h
Line 1297 in cf7ff1b
virtual const char * getType() const = 0; |
-u-ca
type (gregory
)?
If it's from the locale, wouldn't that be visible from the locale of the message format itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec currently says that it's this, but it's not fully defined in v45 (the calendar option is reserved for future standardization). Probably we want to support whatever JS Temporal does or at least be compatible with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should probably link there, as a 'calendar type' in ICU4C is one of the old types generally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs squash |
Co-authored-by: Steven R. Loomis <[email protected]>
@FrankYFTang Are you OK with the current state of the design doc for this PR? The JIRA ticket shows as "Accepted", but I've made changes to the design doc since your last comment. |
return 0; | ||
} | ||
// 'zzzz' to handle the case where the timezone offset follows the ISO 8601 datetime | ||
if (sourceStr.length() > 25) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain where the 25 came from? How do you conclude it need to > 25? and 25 is enough?
https://unicode.org/reports/tr35/tr35-dates.html#Contents show
zzzz | Pacific Daylight Time | The long specific non-location format. Where that is unavailable, falls back to the long localized GMT format ("OOOO"). |
---|
and
OOOO | GMT-08:00 | The long localized GMT format. |
---|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it doesn't really need to check the length at all, just whether the last character is 'Z'. Changed that in c63c6b5.
In 197caef, I added some comments explaining the meaning of the fields in the |
I also removed the |
This needs to be rebased because of #3050 moving where the test data is, but I'll do that at the very end in order to avoid losing review comments. |
if (U_FAILURE(errorCode)) { | ||
return 0; | ||
} | ||
return dateParser->parse(sourceStr, errorCode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- dateParser->parse() is a const method, which mean it could be shared across thread, right?
- there are only total 4 calls into tryPattern() in this method
tryPattern("YYYY-MM-dd'T'HH:mm:ss"
tryPattern("YYYY-MM-dd"
tryPattern("YYYY-MM-dd'T'HH:mm:ssZ"
tryPattern("YYYY-MM-dd'T'HH:mm:sszzzz"
So there are total 4 possible pat , right?
Currently, the creation of the DateFormat is inside the tryPattern()
Everytime the tryPattern() got called, the code need to create a SimpleDateFormat and later destroy it
but there are only total 4 possible SimpleDateFormat needed to be created and the usage are thread safe (see my point 1 above)
So... should we have an lazy initialization routine which create these four SimpleDateFormat and cache it globally, and reuse them later. The creation need to be protected to be init once and therefore mutex to prevent different threads create them concurrently, but then we can reuse them across thread and destroy upon lib clean up.
In this way, we do not need to repeatly create and destroy the SimpleDateFormat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems reasonable. I'll try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done in 8d4da42. I'm not sure of how to unit-test this (i.e. testing that the DateFormats really are initialized only once), but I'll look for examples in the test suite.
if (len <= 6) { | ||
return false; | ||
} | ||
return ((sourceStr[len - 6] == PLUS || sourceStr[len - 6] == HYPHEN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment in the beginning of this function to explain what is the expected input value and the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 4a7da29
By the way, I'll be on vacation from now until September 9, so I won't see further comments until I return. |
@@ -57,6 +57,15 @@ namespace message2 { | |||
|
|||
DateTimeType type; | |||
DateTimeFactory(DateTimeType t) : type(t) {} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so.. you store 4 pointer per DateTimeFactory object. But in messageformat2_function_registry.cpp you create multiple DateTimeFactory object, each object will crate the same four DateFormat. But there are no differences between them from the multiple DateTimeFactory . Could you use InitOnce and cleanup code tostore that four DateFormat to be globally shared ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@catamorphism search for icu::UInitOnce
and ucln_common_registerCleanup
for examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in da9307c
After implementing that change and running under asan, I noticed a use-after-free error. This required me to change some code in tznames.cpp that dealt with reference counting on TimeZoneNamesDelegate
objects. The deleteTimeZoneNamesCacheEntry()
function in that file was assuming there are no other references to the cache entry. But the new global DateFormat
object holds a reference to a cache entry, and the new cleanup function I added was getting called after the timeZoneNames_cleanup()
function and hence after the cache was already deleted. So I had to change both deleteTimeZoneNamesCacheEntry()
and ~TimeZoneNamesDelegate()
to check the ref counts. I made this a separate patch (for ease of review), 1b8e43f.
@FrankYFTang ping? |
Previously, the time zone components of date/time literals were ignored. In order to store the time zone properly for re-formatting, this change also changes the representation of
Formattable
date values to use aGregorianCalendar
rather than aUDate
.This is a public API change and a design doc can be found in the "ICU 76 API proposal status" document (and was also emailed to the icu-design list on April 29.)
In the TC discussion on May 9, there was some uncertainty about whether to use
GregorianCalendar
orCalendar
as the return type ofmessage2::Formattable::getDate()
. I'm leaving it asGregorianCalendar
for now but can change it if necessary.Checklist